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Abstract 

applications concern sparse signals, for example, detecting anomalies from the differences 
between consecutive images taken by surveillance cameras. This paper focuses on the problem of re- 
covering a A'-sparse signal x e M^'^^, i.e., K <^ N and X^iLi ^{^i 7^ 0} = I" the mainstream 
framework of compressed sensing (CS), the vector x is recovered from M non-adaptive linear measure- 
ments y = xS G R^^^^, where S 6 M^^*^ is typically a Gaussian (or Gaussian-like) design matrix, 
through some optimization procedure such as linear programming (LP). 

In our proposed method, the design matrix S is generated from an a-stable distribution with a « 0. 
Our decoding algorithm mainly requires one linear scan of the coordinates, followed by a few iterations 
on a small number of coordinates which are "undetermined" in the previous iteration. Our practical 
algorithm consists of two estimators. In the first iteration, the (absolute) minimum estimator is able to 
filter out a majority of the zero coordinates. The gap estimator, which is applied in each iteration, can 
accurately recover the magnitudes of the nonzero coordinates. Comparisons with two strong baselines, 
linear programming (LP) and orthogonal matching pursuit (OMP), demonstrate that our algorithm can 
be significantly faster in decoding speed and more accurate in recovery quality, for the task of exact 
spare recovery. Our procedure is robust against measurement noise. Even when there are no sufficient 
measurements, our algorithm can still reliably recover a significant portion of the nonzero coordinates. 

To provide the intuition for understanding our method, we also analyze the procedure by assuming 
an idealistic setting. Interestingly, when K ~ 2, the "idealized" algorithm achieves exact recovery with 
merely 3 measurements, regardless of N . For general K, the required sample size of the "idealized" 
algorithm is about bK. The gap estimator is a practical surrogate for the "idealized" algorithm. 

1 Introduction 



The goal of Compressed Sensing ( CS) IHHU is to recover a sparse signal x € M^^^ from a small number of 
non-adaptive linear measurements y = xS, (typically) by convex optimization (e.g., linear programming). 
Here, y G M^^^^ is the vector of measurements and S G M^^^^ is the design matrix (also called the 
measurement matrix). In classical settings, entries of S are i.i.d. samples from the Gaussian distribution 
A^(0, 1), or a Gaussian-like distribution (e.g., a distribution with finite variance). 

In this paper, we sample S from a heavy-tailed distribution which only has the A-th moment with A < a 
and we will choose a 0. Strikingly, using such a design matrix turns out to result in a simple and powerful 
solution to the problem of exact A'-sparse recovery, i.e., Xli^i 7^ 0} = I"^- 

'The results were presented in several seminars in 2012 and in the Third Conference of Tsinghua Sanya International Mathe- 
matics Forum (Jan. 2013) with no published proceedings. 
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1.1 Compressed Sensing 



Sparse recovery (compressed sensing), which has been an active area of research, can be naturally suitable 
for: (i) the "single pixel camera" type of applications; and (ii) the "data streams" type of applications. The 
idea of compressed sensing may be traced back to many prior papers such as ifTOH TllSl. 

It has been realized (and implemented by hardware) that collecting a linear combination of a sparse 
vector, i.e., y = xS, can be more advantageous than sampling the vector itself. This is the foundation of the 



"single pixel camera" proposal. See the site |https : //sites . google . com/site/igorcarron2/ 
[compres sedsensinghardware for a list of implementations of single-pixel-camera type of applica- 
tions. Figure [Uprovides an illustrative example. 

Natural images are in general not as sparse as the example in Figure [H We nevertheless expect that 
in many practical scenarios, the sparsity assumption can be reasonable. For example, the differences be- 
tween consecutive image/video frames taken by surveillance cameras are usually very sparse because the 
background remains still. In general, anomaly detection problems are often very sparse. 
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Figure 1: The task is to reconstruct a 256 x 256 image (i.e., N = 65536) with K = 852 nonzero pixels, using our 
proposed method with ]\I = i^log((A^ — K)/Q.Q\)/C,) measurements, for = 1, 3, 5, 10, 15. Our method, which 
is named "Min+Gap(3)" will be explained later in the paper. For C, = 1, our method is able to exactly reconstruct 
the image, using 11.63 seconds. Strikingly, even with = 15 (i.e., M = 891 measurements only), the reconstructed 
image by our method could still be informative. Note that, although natural images are often not as sparse as this 
example, sparse images are possible in important applications, e.g., the differences between consecutive image/video 
frames taken by surveillance cameras. 

Another line of applications concerns data streams, which can be conceptually viewed as a long sparse 
dynamic vector with entries rapidly varying over time. Because of the dynamic nature, there may not be 
an easy way of knowing where the nonzero coordinates are, since the history of streaming is usually not 
stored. Perhaps surprisingly, many problems can be fonnulated as sparse data streams. For example, video 
data are naturally streaming. A common task in databases is to find the "heavy-hitters" ||22| . e.g., finding 
which product items have the highest total sales. In networks fSTI . this is often referred to as the "elephant 
detection" problem; see some recent papers on compressed sensing for network applications Il20ll27ll28l . 
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For data stream applications, entries of tiie signals x are (rapidly) updated over time (by addition and 
deletion). At a time t, the iftYi entry is updated by It, i.e., Xjj — > Xi^ + It- This is often referred to as the 
turnstile model |j22l. As the projection operation is linear, i.e., y = xS, we can (re)generate corresponding 
entries of S on-demand whenever one entry of x is altered, to update all entries of the measurement vector 
y. The use of stable random projections for estimating the a-th frequency moment J2iLi (instead 
of the individual temis Xi) was studied in lITSl . ifTTl proposed the use of geometric mean estimator for 
stable random projections, for estimating YliLi l^il" well as the hamionic mean estimator for estimating 
when a 0. When the streaming model is not the turnstile model (for example, the update 
mechanism may be nonlinear), lITSl developed the method named conditional random sampling (CRS), 
which has been used in network applications ||30l . At this point, our work focuses on the turnstile data 
stream model. 

In this paper, our goal is to use stable random projections to recover the individual entries Xi's, not just 
the summary statistics such as the frequency moments. We first provide a review of a-stable distributions. 



1.2 Review of a-Stable Distribution 

A random variable Z follows an a-stable distribution with unit scale, denoted by S{a, 1), if its characteristic 
function can be written as ll24l 

^/gv^zA g-|tr^ 0<a<2 (1) 



When a = 2, S{2, 1) is equivalent to the normal distribution with variance 2, i.e., A^(0, 2). When a = 1, 
S{1, 1) is the standard Cauchy distribution centered at zero with unit scale. 

To sample from S{a, 1), we use the CMS procedure |'3|. That is, we sample independent exponential 
w ~ exp{l) and uniform u ~ nm/(— 7r/2, 7r/2) variables, and then compute Z ~ S{a, 1) by 



Z 



sin(an) rcos(w — au) 



(cos uy/'^ 



w 



{l-a)/a 



5(a,l) (2) 



If 5i, 52 ~ 5(q, 1) i.i.d., then for any constants Ci, Ca, we have CiSi + C2S2 = S x (|C7i|" + jCaT)^/", 
where S ~ S{a, 1). More generally, XiSi = S x 

In this paper, we propose using a w 0. In our numerical experiments with Matlab, the value of a is 
taken to be 0.03 and no special data storage structure is needed. While the precise theoretical analysis based 
on a particular choice of (small) a is technically nontrivial, our algorithm is intuitive, as illustrated by a 
simpler analysis of an "idealized" algorithm using the limit of a-stable distributions as a — )• 0. 



1.3 The Proposed Practical Recovery Algorithm 

We assume x € M^^^ is i^-sparse and we do not know where the nonzero coordinates are. We obtain M 
linear measurements y = xS € R^^*^, where entries of S € R^^^^, denoted by Sij, are i.i.d. samples from 
S{a, 1) with a small a (e.g., 0.03). That is, each measurement is yj = ^i^ij- Our algorithm, which 

consists of two estimators, utilizes the ratio statistics Zij = yj/sij, j = 1, 2, M, to recover Xj. 
The absolute minimum estimator is defined as 

Xi,min = Zi^t, where i = argmin ztj = ^ (3) 

which is effective for detecting whether any Xi = 0. In fact we prove that essentially Mq = K log ((A^ — K) / 5) 
measurements are sufficient for detecting all zeros with at least probability 1 — 5. The actual required num- 
ber of measurements will be significantly lower than Mq if we use the minimum algorithm together with the 
gap estimator and the iterative process. 
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Algorithm 1 The proposed recovery algorithm. 



Input: A'-sparse signal x e R^^^, threshold e > (e.g., 10 design matrix S G K^x*f sampled from 5(a, 1) 
with a ?» (e.g., 0.03). S can be generated on-demand when the data arrive at a streaming fashion. 

Output: The recovered signal, denoted by x^, i = 1 to A^. 

Linear measurements: y = xS, which can be conducted incrementally if entries of x arrive in a streaming fashion. 

Detection: For i = 1 to A^, compute ii.mm = Vtl su, where t = argmin^ | i/j /s^^ | . If |ii.mm| ^ set sti = 0. 

Estimation: If \xi^min\ > compute the gaps for the sorted observations yj/sij and estimate Xi using the gap 
estimator Xi,gap- Let Xi = Xi^gap- See the details below. 

Iterations: If |ii.miri| > e and the minimum gap length > e, we call this i an "undetermined" coordinate and set 
Xi = 0. Compute the residuals: r = y — xS, and apply the gap estimator using the residual r, only on the set of 
"undetermined" coordinates. Repeat the iterations a number of times (e.g., 2 to 4) until no changes are observed. This 
iteration step is particularly helpful when M <C Mq = K log((A^ — K) / 5) where 5 = 0.05 or 0.01 . 



When |xj^mj„| > e, in order to estimate the magnitude of Xi , we resort to the gap estimator defined 
as follows. First we sort Zjj's and write them as order statistics: (i) < Zi^{2) < ... < Zi^(^M)- Then we 
compute the gaps: gj = Zj q+i) — 1 < i < M — 1. The gap estimator is simply 

Xi,gap = ]:{zi^{j,) + , ji= argmin gj = Zi^^j+i) - Zi^(^j) (4) 

We have also derived theoretical eiTor probability bound for Pr {\xi^gap — Xi\ > e). When M < Mq, we 
discover that it is better to apply the gap estimator a number of times, each time using the residual measure- 
ments only on the "undetermined" coordinates; see Alg. [T] The iteration procedure will become intuitive 
after we explain the "idealized algorithm" in Sec.|2]and Sec. [5] 

Note that our algorithm does not directly utilize the information of K. In fact, for small a, the quantity 
0" (which is very close to K) can be reliably estimated by the following harmonic mean estimator ||T7| : 

-|r(-„)sinf. / ^ /-.r(-2a)si„(..) _ \\ 

Ej=iJ^ V V [r(-a)smfa] // 

2 Intuition 

While a precise analysis of our proposed method is technical, our procedure is intuitive from the ratio of 
two independent a-stable random variables, in the limit when a — 0. 

Recall that, for each coordinate i, our observations are [yj, sij), j = 1 to M. Naturally our first attempt 
was to use the joint likelihood of (y^ , Sij). However, in our proposed method, the observations ai"e utilized 
only through the ratio statistics y^j Sij. We first explain why. 

2.1 Why Using the Ratio Statistics Vj/sijl 

For convenience, we first define 
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Denote the density function of S{a, 1) by fs{s). By a conditional probability argument, the joint density 
of {uj, Sij) can be shown to be ■ff:fs{sij)fs J^'^'^ ^ , from which we can derive the joint log-likelihood 

of {yj,Sij),j = ItoM, as 



M M 

l{xi,Oi) = ^\og fs{sij) + ^\og fs 
i=i i=i 



Uj XiSij 



(7) 



Here we can treat Xi and Oi as the parameters to be estimated. Closed-form density functions of fs are in 
general not available (except for a = 2 and a = 1). Interestingly, when a « 0, we can obtain a convenient 
approximation. Recall, for two independent variables w ~ exp{l) and u ~ unif{—TT /2, vr/2), we have 



sin(au) rcos(u — an) 



w 



(l-a)/a 



S{a, 1), 



(cos u)^/° 

Thus, it is intuitive that l/\Z\°' is approximately w ~ exp{l) when a 0. As rigorously shown by ||6], 
1/|Z|" — >■ exp(l) in distribution. Using this limit, the density function fs{s) is approximately f ^|7pq^> 
and hence the joint log-likelihood /(xj, 6i) is approximately 

M AI 



i{xi,ei) ^\og fs{sij) + 



I'^IJ I 



(a + 1) log \yj - XiSij \} +aMlogei + M log ( - 



a 



which approaches infinity (i.e., the maximum likelihood) at the poles: yj — XiSij = 0, j = 1 to M. This is 
the reason why we use only the ratio statistics Zij = yj/sij for recovering Xj. 

2.2 The Approximate Distribution of yj / Sij 

Since our procedure utilizes the statistic yj/sij, we need to know its distribution, at least approximately. 



EN 

Recall the definition 6.: = 



Note that ^ 



T,t^t-^tst] _|_ _ ^^52 _|_ ^.^ where Si and S2 are i.i.d. S{a, 1) variables. 



'-Si 

J2t^i \xt\°'f^"- Thus, Pr <tj = Pr (^§^ < and the problem 

boils down to finding the distribution of the ratio of two a-random variables with a 0. Using the limits: 
l/|5'i|'^ — > exp{l) and 1/|5'2|" exp{l), as a — )• 0, it is not difficult to show an approximate cumulative 
distribution function (CDF) of yj / Sij : 



Pr(^<tUPr(|^<^ 



21- 



21+ 



t < Xi 



t > Xi 



(8) 



The CDF of 52/51 is the also given by ^ by letting Xi = and Oi = 1. 

Figure |2]plots the approximate CDFs dSjl for S2/S1 (left panel) and yj/sij (right panel, with Xj = and 
three values of 0"). While the distribution of S2/S1 is extremely heavy-tailed, about half of the probability 
mass concentrated near 0. This means, as a — )• 0, samples of |52/5i| are equal likely to be either very close 
to zero or very large. Since dS) is only approximate, we also provide the simulations of ^2 / Si in Figure [3] 
to help verify the approximate CDF in Figure |2] 
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Figure 2: Approximate CDFs of S2/S1 (left panel) and (right panel) as given by (O. The CDF of S2/S1 is 
very heavy-tailed with an essentially vertical jump around 0. This means, samples of S2/S1 are likely to be either 
very close or very large (approaching ±00). The CDF of yj/sij is a scaled (and shifted) version of the CDF of 
S2/S1, with a almost vertical jump around the true Xi . This special structure motivates us to develop the gap estimator 
for Xi. Suppose we have M observations of yj/sij, j = 1 to AI. Observations outside {xi — e,Xi + e) for very 
small e will likely be far away from each other because the distribution is extremely heavy-tailed. Observations 
within {xi — e,Xi + e) are extremely close to each other, which will be used for identifying the true Xi because these 
observations cluster around Xi. 
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Figure 3: Simulations of 1 6*2 /S*! | directly using the formula ([2) for generating two independent a-stable variables Si 
and 5*2. With a — > 0, it is clear that most of the samples are either very close to (< 10"^) or very large (> 10^). 
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2.3 An Example with K = 2io Illustrate the "Idealized" Algorithm 

To illustrate our algorithm, in particular the iterative procedure in Alg.[T] we consider the simplest example 
of K = 2. Without loss of generality, we let xi = X2 = I and Xi = 0,\/ 3 < i < N. This way, our 
observations ai'e yj = xisij + X2S2j = sij + S2j for j = 1 to M. The ratio statistics are 

= Vj/sij = 1 + — 

Slj 

Z2,j = yj/s2j = 1 + 

S2j 

Zi,j ~ Vj/^ij ~ ~ ~l ~^ i > 3 
Sij Sij 

We assume an "idealized" algorithm, which allows us to use an extremely small a. As a ^ 0, — is 
either (virtually) or ±00. Note that ^ is the reciprocal of i.e., ^ <J=^ ^ « ±00. 
Suppose, with M = 3 observations, the ratio statistics, for f = 1, 2, are: 

{zi,i,Z2,i) = {l,±oo), (21,2,^2,2) = (±00, 1), (2;i,3,2;2,3) = (l,±oo) 

Then we have seen zij = 1 twice and we assume this "idealized" algorithm is able to correctly estimate 
£1 = 1, because there is a "cluster" of I's. After we have estimated xi, we compute the residual rj = 
Uj — xisij = S2j- In the second iteration, the ratio statistics become 

rj/s2j = 1 

"^j/sij = ^) i > 3 

•Sij 

This means we know X2 = 1- Next, we again update the residuals, which become zero. Therefore, in the 
third iteration, all zero coordinates can be recovered. The most exciting part of this example is that, with 
M = 3 measurements, we can recovery a signal with K = 2, regardless of N. When K > 2, the analysis 
of the "idealized" algorithm requires a bit more work, which we present in Sec. |5] 
We summarize the "idealized" algorithm (see more details in Sec.|5]) as follows: 

1. The algorithm assumes a — > 0, or as small as necessary. 

2. As long as there are two observations Vj/sij in the extremely naiTow intei^val (xj — e,Xi + e) with e 
very close to 0, the algorithm is able to con^ectly recover Xi. We assume e is so small that it is outside 
the required precision range of Xj. Here we purposely use e instead of e to differentiate it from the 
parameter e used in our practical procedure, i.e., Alg. [1] 

Cleaiiy, this "idealized" algorithm can not be strictly faithfully implemented. If we have to use a small 
a instead of a = 0, the observations \yj/ Sij\ will be between and 00, and we will not be able to identify 
the true Xj with high confidence unless we see two essentially identical observations. This is why we specify 
that an algorithm must see at least two repeats in order to exactly recover Xi. As analyzed in Sec. [6l the 
proposed gap estimator is a practical surrogate, which converges to the "idealized" algorithm when a — 0. 

2.4 The Intuition for the Minimum Estimator and the Gap Estimator 

As shown in Figure |2] (right panel), while the distribution of Vj/ Sij is heavy-tailed, its CDF has a significant 
jump very near the true xi. This means more than one obsei-vations (among M observations) will likely lie 
in the extremely naiTow interval around Xj, depending on the value of 0" (which is essentially K). We are 
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able to detect whether Xj = if there is just one observation near zero. To estimate the magnitude of Xi, 
however, we need to see a "cluster" of observations, e.g., two or more observations which are essentially 
identical. This is the intuition for the minimum estimator and the gap estimator. Also, as one would expect. 
Figure |2] shows that the performance will degrade (i.e., more observations are needed) as Of increases. 

The gap estimator is a practical surrogate for the "idealized" algorithm. Basically, for each i, if we 
sort the observations: (i) < (2) < ••• ^ ^j,(Af)' the two neighboring observations coixesponding to the 
minimum gap will be likely lying in a nanow neighborhood of Xj, provided that the length of the minimum 
gap is very small, due to the heavy concentration of the probability mass about Xj. 

If the observed minimum gap is not small, we give up estimating this ("undermined") coordinate in the 
current iteration. After we have removed the (reliably) estimated coordinates by computing the residuals, 
we may have a better chance to successfully recover some of these undermined coordinates because the 
effective "ivT" and the effective "A^" are significantly reduced. 

Finally, to better understand the difference between the minimum estimator and the gap estimator, we 
compute (assuming x-i > 0) the probability Pr {\yj/sij \ < x-i — e), which is related to the probability that 
the two estimators differ, i.e., |xi^gap| > |a^i,mm|- Using the approximate CDF of yj/sij ©, we have 

^ / , N 1/2 1/2 

Pr [\yj/sij\ < Xi-e) - 



l + e'^/K l + {2xi)°'/K 

When a = 0.03, Xi = 10, e = 10"^, K = 100, this probability is about 0.002, which is not small 
considering that we normally have to use at least M = 5K measurements. Given enough measurements, 
almost certainly the minimum estimator will be smaller than the gap estimator in absolute values, as verified 
in the simulations in Sec. |4l In other words, the minimum estimator will not be reliable for exact recovery. 
This also explains why we need to see a clustered observations instead of relying on the absolute minimum 
observation to estimate the magnitude. 



2.5 What about the Sample Median Estimator? 

Since the distribution of yj/sij is symmetric about the true Xj, it is also natural to consider using the sample 
median to estimate Xj. We have found, however, at least empirically, that the median estimator will require 
significantly more measurements, for example, M > 5Mq or more. This is why we develop the gap 
estimator to better exploit the special structure of the distribution as illustrated in Figure |2] 



3 Two Baselines: LP and OMP 

Both linear programming (LP) and orthogonal matching pursuit (OMP) utilize a design matrix sampled 
from Gaussian (i.e., a-stable with a = 2) or Gaussian-like distribution (e.g., a two-point distribution on 
{— 1, +1} with equal probabilities). Here, we use S(2) to denote such a design matrix. 

The well-known LP algorithm recovers the signal x by solving the following li optimization problem: 

min|tx||i subject to y = xS(2) (9) 

which is also commonly known as Basis Pursuit ||4l. It has been proved that LP can recover x using 
M = 0{K\og{N/K)) measurements |QT1, although the exact constant is unknown. This procedure is 
computationally prohibitive for large A^ (e.g., A^ = 10^). When there are measurement noises, the LP 
algorithm can be modified as other convex optimization problems, for example, the Lasso algorithm ll25l . 

The orthogonal matching pursuit (OMP) algorithm flT] is a popular greedy iterative procedure. It typi- 
cally proceeds with K iterations. At each iteration, it conducts univariate least squares for all the coordinates 
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on the residuals, and chooses the coordinate which maximally reduces the overall square eiTors. At the end 
of each iteration, all the chosen coordinates are used to update the residuals via a multivariate least square. 
The algorithm can be coded efficiently (e.g., in Matlab) (But we find OMP is still significantly slower than 
our method especially when K is not small.) |[29l [T2l showed that, under appropriate conditions, the re- 
quired number of measurements of OMP is essentially 0{K \og{N — K)), which improved the prior result 
in im. There are also modified OMP algorithms, e.g., CoSaMP ll23l . 

In this paper, our experimental study will focus on the comparisons with OMP and LP, as these two 
methods are the most basic and still strong baselines. Of course, we recognize that compressed sensing is a 
rapidly developing area of research and we are aware that there are other promising sparse recovery methods 
such as the "message-passing" algorithm |'9l and the "sparse matrix" algorithm llT3l . We plan to compare 
our algorithm with those methods in separate future papers. 

4 Simulations 

To validate the procedure in Alg. [T] we provide some simulations (and comparisons with LP and OMP), 
before presenting the theory. In each simulation, we randomly select K coordinates from a total of N coor- 
dinates. We set the magnitudes (and signs) of these K coordinates according to one of the two mechanisms, 
(i) Gaussian signals: the values are sampled from Normal{0, 5^). (ii) Sign signals: we simply take the 
signs, i.e., { — 1,0, 1}, of the generated Gaussian signals. The number of measurements M is chosen by 

M = Mo/C, Afo = K log {{N - K)/6) (10) 

where 5 = 0.01 and C G {1,1.3,2,3,4,5}. When M = Mq, all methods perform well in terms of 
accuracies; and hence it is more interesting to examine the results when M < Mq. Also, we simply fix 
e = 10"^ in Alg[T] Our method is not sensitive to e (as long as it is small). We will explain the reason in the 
theory section. 

4.1 Sample Instances of Simulations 

Figures |4] to |9] present several instances of simulations, for N = 100000 and K = 30. In each simulation 
(each figure), we generate the heavy-tailed design matrix S (with a = 0.03) and the Gaussian design matrix 
S(2) (with a = 2), using the same random variables (w's and n's) as in Q. This provides shoulder-by- 
shoulder comparisons of our method with LP and OMP. We use the Zi -magic package |1| for LP, as we 
find that -magic produces very similar recovery results as the Matlab build-in LP solver and is noticeably 
faster. Since -magic is popular, this should facilitate reproducing our work by interested readers. 

In Figures ID and m we let M = Mq (i.e., C = 1)- For this M, all methods perform well, for both 
sign signal and Gaussian signal. The left-top panels of Figures |4] and [5] show that the minimum estimator 
S:i,min can precisely identify all the nonzero coordinates. The right-top panels show that the gap estimator 
S^i.gap applied on the coordinates identified by Xi^min, can accurately estimate the magnitudes. The label 
"min-i-gap(l)" means only one iteration is performed (which is good enough for ( = 1). 

The bottom panels of Figures|4]and|5]show that both OMP and LP also peifomi well when C = 1- OMP 
is noticeably more costly than our method (even though K is small) while LP is significantly much more 
expensive than all other methods. 

We believe these plots of sample instances provide useful information, especially when M <^ Mq. If 
M is too small, then all methods will ultimately fail, but the failure patterns are important, for example, a 
"catastrophic" failure such that none of the reported nonzeros is correct will be very undesirable. Figure |8] 
and Figure|9]will show that our method does not fail catastrophically even with M = Mo/5. 
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Figure 4: Reconstruction results from one simulation, using N = 100000, K ~ 30, M = Mq (i.e., ( = 1), and sign 
signals. The reconstructed signals are denoted by (red) circles. The minimum estimator (left-top) is able to identify all 
nonzero coordinates with no false positives, using only 0.55 seconds. With one iteration of the gap estimator (right- 
top), we can perfectly reconstruct the signal using additional 0.03 seconds (so the total time is 0.58 seconds). The 
OMP and LP algorithms (bottom panels) also perform well at significantly higher computational costs. 
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X 10 

Mq, and Gaussian signals. 



10 



Simulations in Figures |6] and |7] use M = Mo/3 (i.e., C, = 3). The minimum estimator xi^min outputs 
a significant number of false positives but our method can still perfectly reconstruct signal using the gap 
estimator with one additional iteration (i.e., Min+Gap(2)). In comparisons, both LP and OMP perform 
poorly. Furthermore, Figures |8] and |9] use M = M^/b (i.e., C = 5) to demonstrate the robustness of our 
algorithm. As M is not large enough, a small fraction of nonzero coordinates are not recovered by our 
method, but there are no catastrophic failures. This point is of course already illustrated in Figure [T] In 
comparison, when M = Mq/S, both LP and OMP perform very poorly. 

Note that the decoding times for our method and OMP are fairly consistent in that smaller M results 
in faster decoding. However, the run times of LP can vary substantially, which should have to do with 
the quality of measurements and implementations. Also, note that we only display the io^-K (in absolute 
values) reconstructed coordinates for all methods. 
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Figure 6: Reconstruction results from one simulation, with N = 100000, K = 30, M = Mo/3 (i.e., C, = 3), and sign 
signals. Many of the false positives produced by the min estimator are removed by the gap estimator after 1 iteration. 
The signal is perfectly reconstructed after the second iteration. In comparisons, both OMP and LP perform poorly. 
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Figure 7: Reconstruction results from one simulation, using A'' = 100000, K = 30, M = Ma/ 3 (i.e., ( = 3), and 
Gaussian signals. Our method (using two iterations) can still perfectly reconstruct the signal. 
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Figure 8: Reconstruction results from one simulation, with N ~ 100000, K = 30, M = Mo/5 (i.e., C = 5), and sign 
signals. Since M is not large enough, a small fraction of the nonzero coordinates are not reconstructed by our method. 
In comparisons, both OMP and LP perform very poorly in that none of the reported nonzero coordinates is correct. 
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Figure 9: Reconstruction results from one simulation, using N = 100000, K = 30, M = Mo/5 (i.e., C, = 5), and 
Gaussian signals. Again, since M is not large enough, a small fraction of the nonzero coordinates are not reconstructed 
by our method. In comparisons, both OMP and LP perform very poorly. 
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Finally, Figure [TO] supplements the example in Figure [T] (which only displays "Min+Gap(3)") by pre- 
senting the results of "Min+Gap(l)" and "Min+Gap(2)", to illustrate that our proposed iterative procedure 
improves the quality of reconstructions. 
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Figure 10: This is the continuation of the example in Figure[T] for M = Mo/3 (top panels) and M = Mo/5 (bottom 
panels), to demonsn^ate that the iterative procedure improves the quality of signal reconstructions. 



4.2 Summary Statistics from Simulations 

We repeat the simulations many times to compare the aggregated reconstructed errors and run times. In 
this set of experiments, we choose (iV, K) from {(5000, 50), (10000, 50), (10000, 100), (100000, 100)} 
combinations. We choose M = Mq / ( with € {1,1.3,2,3,4, 5}. We again experiment with both Gaussian 
Nor7nal{0, 5^) signals and sign signals. 

For each setting, we repeat the simulations 1000 times, except {N, K) = (100000, 100), for which we 
only repeat 100 times as the LP experiments take too long. 



4.2.1 Precision and Recall 

For sparse recovery, it is crucial to correctly recover the nonzero locations. Here we borrow the concept of 
precision and recall from the literature of information retrieval (IR): 

# True Nonzeros tp „ # True Nonzeros tp 

Precision = : = — , Recall = — 



# Returned Nonzeros tp + fp # Total True Nonzeros tp + fn 

to compare the proposed absolute minimum estimator with LP decoding. Here, we view nonzero coordinates 
as "positives" (p) and zero coordinates as "negatives" (n). Ideally, we hope to maximize "true positives" (tp) 
and minimize "false positives" (fp) and "false negatives" (fn). In reality, we usually hope to achieve at least 
perfect recalls so that the retrieved set of coordinates contain all the true nonzeros. 



15 



Figure [TT]presents the (median) precision-recall curves. Our minimum estimator always produces essen- 
tially 100% recalls, meaning that the true positives are always included for the next stage of reconstruction. 
In comparison, as M decreases, the recalls of LP decreases significantly. 





^ (# Measurements = M^Q ^ (# Measurements = M^/Q 




^ (# Measurements = M^/Q ^ (# Measurements = M^/Q 



Figure 11: Median precision and recall curves, for comparing our proposed minimum estimator with LP decoding. 
The minimum estimator produces essentially 100% recalls even for M as small as Mo/5. 
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4.2.2 Reconstruction Accuracy 



The reconstruction accuracy is another useful measure of quality. We define the reconstruction error as 



Note that, since errors are normalized, a value > 1 should indicate a very bad reconstruction outcome. 

Figure [12] presents the median reconstniction errors. At M = Mq (i.e., ( = 1), all methods perform 
well. For sign signals, both OMP and LP perform poorly as soon as ^ > 1-3 or 2 and OMP results are 
particularly bad. For Gaussian signals, OMP can produce good results even when C = 3. 

Our method performs well, and 2 or 3 iterations of the gap estimation procedure help noticeably. One 
should keep in mind that errors defined by (fTTT i may not always be as informative. For example, with 
M = Mo/5, Figures [8] and |9] show that, even though our method fails to recover a small fraction of nonzero 
coordinates, the recovered coordinates are accurate. In comparison, for OMP and LP, essentially none of the 
nonzero coordinates in Figures |8]and|9]could be accurately identified when M = Mo/5. This confirms that 
our method is stable and reliable. Of course, we have already seen this behavior in the example in Figure [T] 
In that example, even with M k, K, the reconstructed signal by our method is still quite informative. 

4.2.3 Reconstruction Time 

Figure [T3] confirms that LP is computationally expensive, using the /i-magic package HI. We have also 
found that the Matlab build-in LI solver can take significantly more time (and consume more memory) than 
the Zi-magic package. In comparison, OMP is substantially more efficient than LP, although it is still much 
more costly than our proposed algorithm, especially when K is not small. In our experiments with the data 
for generating Figured] OMP was more than 100 times more expensive than our method. 

4.3 Measurement Noise 

Our proposed algorithm is robust against measurement noise. Recall = Vj/sij = Xi + 9i^. With 
additive noise n, we have ^^^^ = Xi + 0iS2/Si + n/Si. Since the observations are only useful whenS'2/S'i 
is essentially 0, i.e., 5i is large, n/Si does not really matter for our procedure. We will provide more results 
about measurement noise in Sec.|7] 
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Figure 12: Median reconstruction errors (ITTi . for comparing our proposed algorithms with LP and OMR When the 
number of measurements M = Mq, all methods perform well. As M decreases, the advantage of our proposed 
algorithm becomes more obvious, especially with 2 or 3 iterations. Note that, with M = Mo/5, even though the error 
of our method in terms of (fTll is quite large, the error comes from the small fraction of coordinates which our method 
"gives up". The reported nonzero coordinates by our method are still very accurate. See Figure |8]and Figure 
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Figure 13: Median reconstruction times, for comparing our proposed algorithm with OMP and LP. When K is not 
small, our method can be significantly more efficient than OMP. LP is very expensive. Note that in these experiments, 
K is not too large, otherwise the computational advantages over LP and OMP will be even more significant. 
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5 Analysis of the "Idealized" Algorithm 



The analysis of our practical algorithm, i.e., the gap estimator and iterative procedure, is technically non- 
trivial. To better understand our method, we first provide the analysis of the "idealized" algorithm. 



5.1 Assumptions 

The "idealized" algorithm makes the following three major assumptions: 

1. The coordinate Xi is perfectly estimated (i.e., effectively zero error) if the estimate Xi satisfies \xi — 
Xi\ < e, for small e. This assumption turns out to be realistic in practice. For example, if Xi can only 
be either or integers, then e < 0.1 (or even e < 0.5) is good enough. If we know that the signal 
comes from a system with only 15 effective digits, then e < 10^^^ is good enough (for example, in 
Matlab "(l+le-16)-l==0"). Here, we do not try to specify how e is determined; we can just think of 
it as a small number which physically exists in every application. 

2. The algorithm is able to use a — for generating stable random variables and no numerical errors 
occur during calculations. This convenient assumption is of course strong and not truly necessary. 
Basically, we just need a to be small enough so that |xj — Xi\ < e. For the sake of simplifying the 
analysis, we just assume a — > 0. 

3. As long as there are at least two observations yj/sij in the interval (xj — e,Xi + e), the algorithm 
is able to perfectly recover Xj. This step is subtly different from the gap estimator in our practical 
procedure. Because we do not know Xi in advance, we have to rely on gaps to guess Xi. Although the 
distribution of yj/sij is extremely heavy-tailed (as shown in Figure |2]i, there is always an extremely 
small chance that two nearly identical observations reside outside {xi — e,Xi + e). Of course, with a 
decreases, the difference between the gap estimator and the "idealized" algorithm diminishes. 



5.2 Success Probability of Each Observation 

With the above assumptions, we ai^e able to analyze the "idealized" algorithm. First, we define the success 
probability for each observation: 

Pe = lim Pr {\yj/sij - Xi\ < e) (12) 

a— >0 

l/o 



As shown in Sec. |2l yj/sij = Xj + 0182/ Si, where 81,82 ~ 8[a,l) i.i.d. and 6i = (^X^f^j \xt\ 

Thus, Pe = linia^oPr (|5'2/S'i| < e/Oi). Recall l/\8if ^ -uji ~ exp(l) and l/\82f ^ 1(^2 ~ exp(l), 
as a — )• 0. In this section, for notational convenience, we simply write l/|5i|" = wi and 1/|52|" = W2- 
Therefore 

Pe = lim Pr {\82/8i \ < e/Oi) = lim Pr {wi/w2 < e°/^f) = lim ^ 



a^O - -1"^/ ^^0 l + l/(e"/0f) 

1 if X,- = ^^^^ 



K+l- 



This means the number of observations falling in (xj — e, Xj + e) follows a binomial distribution {M,pe). 
We can bound the failure probability (i.e., the probability of having at most one success) by 6, as 



1 - pe)^' + M (1 - pe)^'~^ Pe < 6 (14) 
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To make sure that Xi can be perfectly recovered, we need to choose M large enough such that 

+ {l-l/K)^^~'^ M/K <5 (15) 

Clearly, M = 2K log 1/5 is sufficient for any 6 G (0, 1). But we can do better for small 6: 

M > 1.60i^logl/5, if 5 < 0.05 (16) 
M > 1.45/s:iogl/5, if 5 < 0.01 (17) 

which is 4.8K when 6 = 0.05, and about 6.7K when 6 = 0.01. 
5.3 The Iterative Procedure 

There are N coordinates. If we stop the algorithm with only one iteration, then we have to use the union 
bound which will result in an additional log N multiplicative term. However, under our idealistic setting, 
this log N term is actually not needed if we perform iterations based on residuals. Each time, after we re- 
move a nonzero coordinate which is perfectly recovered, the remaining problem only becomes easier, i.e., 
the success probability becomes > and M remains the same. 

Suppose, xi = X2 = ... = xk = 1 and Xi = if K < i < N. The ratio statistics are, for j = 1 to M, 
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Suppose in the first iteration, xi is perfectly recovered. That is, among M observations, for at least two 



observations, zij is very close 1, i.e.. 



£2i _|_ £3i _)_ _|_ Mi 



< e, for at least j values. 



After we recover xi, we compute the residual and move to the second iteration: 

1 , •^3i , , SKj 
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We need to check to make sure that we have an easier problem in that the success probability is at least 1 / K. 
It turns out this probability is at least 1/{K — 1), which is even better of course. To see this, we consider 
two cases, depending on whether in the first iteration zij is a success. 
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Case 1: In the first iteration, the observation zij is a success. 
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Case 2: In the first iteration, the observation zi j is not a success. 
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In both cases, the success probability in the second iteration increases (i.e., larger than 1/K). Note that, 
assuming xi = a;2 = ... = xk = 1 is for notational convenience. As long as a 0, the same result holds. 



5.4 A Simplified Analysis for the Total Required Measurements 

The above analysis provides the good intuition but it is not the complete analysis for the iterative procedure. 
Note that the total number of iterations m satisfies < 1, i.e., m > log K/ log 1/5. A precise analysis of 
the iterative procedure involves calculating the conditional probability. For example, after the first iteration, 
n (out of A^) nonzero coordinates are recovered. A rigorous analysis of the success probability for the 
second iteration will require conditioning on all these n recovered coordinates and the remaining N — n 
unrecovered coordinates. Such an analysis may add compUcation for understanding our method. 
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Here, for simplicity, we assume that in each iteration, we use "fresh" projections so that the iterations 
become independent. This way, the total number of required measurements becomes (assuming 5 < 0.05): 

1.6Klogl/6 + 61.6K\ogl/6 + 6^1.6Klogl/6 + ... < -^l.QKlogl/6 (18) 

1 — 

The additional multiplicative factor has little impact on the overall complexity. When 6 = 0.05, the 
required number of measurements becomes 5.0K instead of 4.8A'. The result is still highly encouraging. 



6 Analysis of the Practical Algorithm 

This section will develop a more formal theoretical analysis of our procedure in Alg. [T] including the mini- 
mum estimator and the gap estimator. The gap estimator is a surrogate for the "idealized" algorithm. 

The minimum estimator is not crucial once we have the gap estimator and the iterative process. We 
keep it in our proposed procedure for two reasons. Firstly, it is faster than the gap estimator and is able to 
identify a majority of the zero coordinates in the first iteration. Secondly, even if we just use one iteration, 
the required sample size for the minimum estimator M is essentially K log N/ 6, which already matches the 
known complexity bounds in the compressed sensing literature. 



6.1 Probability Bounds 

Our analysis uses the distribution of the ratio of two independent stable random variables, 81,82 ~ 8{a, 1). 
As a closed-form expression is not available, we compute the lower and upper bounds. First, we define 

F^{t) = Pr (|52/5ir/('^") < t) , t > (19) 

where 



W2 



a/(l—a) ■ I \ 

' = 77— [cos(n - au)p " 



based on the CMS procedure (|2]i for generating a-stable random variables. The following lemmas provide 
several useful bounds for -FQ,(t). 

Lemma 1 For all t > 0, 

^' \l + Qa/tJ- ll + l/t l + l/t J 



In particular, when t < 1/3, we have 



In addition, for any fixed t >0, 



Fait) > 3^^, t < 1/3 (21) 



limFjt) = 3_ (22) 



Proof: See Appendix^ □ 
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Lemma 2 If < a < 1/3, then 



where 



Fait) <Cat^+'^ max{l,ti+-} 



^ 1 , ,^ 1^ /l-a\ i+- /1 + a 

= /iiAi2 + - (Ai2(l - hi 

vr \ a J VI — a 



1 r (1/(2 - 2a)) r ((1 - 3a)/(2 - 2q)) 
vr 



r((2 - 3a)/(2 - 2a)) 
/i2 = 1/ COS (7ra/(2 — 2a)) 

Ca^l + l/TTasa^O,Ca< 1.5 r/a < 0.05, and C« < 2 /fa < 0.16. 
Proof: See Appendix\B\ Fisure\14\plots Cafora G [0,0.3]. 



(23) 

(24) 

(25) 
(26) 

□ 




0.05 0.1 0.15 0.2 0.25 0.3 



Figure 14: The constant Cq as defined in 



Lemma 3 For all < s < t, 

Fait) - Fais) < (1 - s/t)Fait) < (i/s " l)i^a(s) 

Proof: For allO <s <t, we have Fait)/t = E (^j:^^ < Fais)/s = E [j:^^ 

Figure [T5]plots the simulated Fait) curves together with the upper and lower bounds. 
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t 



(27) 

□ 



Figure 15: Fait) obtained by simulations (solid curves), for a — 0.03 and a — 0.1. In each panel, the bottom and top 
curves are the lower bound (|20^ and upper bound (|23), respectively. The lower bound is sharp, especially for small a. 
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6.2 Analysis of the Absolute Minimum Estimator 

Recall the definition of the absolute min estimator: 



yt_ 

Sit 



where t = argmin 

i<i<A/ 



(28) 



If \xi,min\ > e» then we consider the i-th coordinate is a (candidate of) nonzero entry. Our task is to analyze 
the probability of false positive, i.e., Pr (|xi mm| > e,Xi = 0), and the probability of false negative, i.e., 
Pr (|xi,mm| < > e). Again, we should keep in mind that, in the proposed method, i.e., Alg.[T] the 

minimum estimator is merely the first crude step for filtering out many true zero coordinates. False positives 
will have chance to be removed by the gap estimator and iterative process. 

6.2.1 Analysis of False Positives 

Theorem 1 Assume = (f ) ^ < 1/3, where 6*° = Y.f=i l^iT- Then 

1 



(1 + V') 



M ' 



(29) 



Proof: ^ = ^'=1":'''' = Oi^^+Xi, where Si and S2 are i.i.d. S{a, 1) variables. When Xi = 0, ^ = 9^. 



Using the probability bound in Lemma\l\ we obtain 



Si 



Pr > 6, Xj — 0) 



Pr ^ 


Vj 




Sij 



> e, Xj = 



l-Prf|52/5i|°/(^-°) < (e/e) 



q/(1-q) 



M 



n M 



[Pr{\S2/Si\>e/e)] 



M 



(1-F«(^))^^< (1 



i + i/V' 



M 



(i + V') 



M 



□ 



The assumption = {§) < 1/3 is reasonable for small a because ~ ^ ~ 1/-?^- 



6.2.2 Required Number of Measurements 

We derive the required M, number of measurements, based on the false positive probability in Theorem [T] 
This complexity result is useful if we just use one iteration, which matches the known complexity bounds 
in the compressed sensing literature. 



Theorem 2 To ensure that the total number of false positives is bounded by 6, it suffices to let 

log {{N - K)/5) 



M > 



log(l + '0) 



Since 



1 /K and 1 / log(l + ijj) ~ K, we define 

Mq = K\og{{N - K)/5) 



(30) 

□ 

(31) 



as a convenient approximation. Note that the parameter e affects the required M only through e°. This means 
our algorithm is not sensitive to the choice of e. For example, when a = 0.03, then (10"^)°^ = 0.8128, 
(10-"^)" = 0.7586. If we can afford to use very small a like 0.001, then (lO-iO)O OOi = 0.9772 and 
(10-40)0.001 ^ 0.912. 
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6.2.3 Analysis of False Negatives 



Theorems Ifa< 0.05, 



a/(l-Q) 



< 1/3, then 



Prflfi 



< e, \xi\ > e) < < 1 



l + Q 



\xi\ + e 



a/(l-Q)' 



M' 



Proof: 



1 - 


1 -Pr 1^ 




< 






Sij 





< e,\xi\ > e 



M 



Again, we can write ^ = + Xi. By symmetry, 



Pr 


( 








Sij 










F 


-( 







< e, > e = Pr {{\xi\ - e)/^^ < S2/S1 < i\xi\ + e)/9i) 



ml + e 



o/(l-a)' 



Fa 



Xi - e 



(32) 



The result follows from the probability bounds, Fa{t) < 1.5*1+° and Fa{t) — Fa{s) < (1 — s/t)Fa{t). □ 
6.2.4 The Choice of Threshold e 

We can better understand the choice of e from the false negative probability as shown in Theorem |3] Assume 
7^ and \xi\/€ = Hi ^ I, the probabihty Pr (|xi,mm| < l^^i] > e) upper bound is roughly 



Hi -I 



Hi + 1 



M 



3/4 2a 



M 



1 - e 

aM , alog{{N-K)/5) 



_ 3/2aM 
KHi 



As we usually choose M < Mq = Klog{{N - K)/5), we have ^ < _ ensure that all 

the K nonzero coordinates can be safely detected by the minimum estimator (i.e., the total false negatives 
should be less than 5), we need to choose 



1 



< 



f^^H. ^ 1.5a log 



(33) 



For sign signals, i.e., |xj| = 1 if Xi 7^ 0, we need to have Hi > l.baK log ^ /S, or equivalently 
e < f N^K ■ If K = 100 (or 1000), it is sufficient to let e = 10"^ (or 10"^). Note that even with 

1.5aK log — J — 

N = 2^2 (and 6 = 0.01), log{N/6) = 26.8 is still not large. 

For general signals, when the smallest Hi dominate X^x^o W' essentially just need the smallest 
Hi > 1.5q log /S, without the K term. In our experiments, for simplicity, we let e = 10~^, for both 
sign signals and Gaussian signals. 

Again, we emphasize that this threshold analysis is based on the minimum estimator, for the first iteration 
only. With the gap estimator and the iterative process, we find the performance is not sensitive to e as long 
as it is small, e.g., 10"^ to 10~^. 
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6.3 Analysis of the Gap Estimator 

The absolute minimum estimator only detects the locations of nonzero coordinate (in the first iteration). To 
estimate the magnitudes of these detected coordinates, we resort to the gap estimator, defined as follows: 



= Vj/Sij, < 2i,(2) < ■•■ < Zi^(M) 

ji= argmin - 

^<j<M-l 



(34) 
(35) 

(36) 



Although the gap estimator is intuitive from Figure |2l the precise analysis is not trivial. To analyze the 
recovery error probabihty bound of Pr {\xi^gap — Xi\ > e), we first need a bound of the gap probability. 

6.3.1 The Gap Probability Bound 

Lemma 4 Let k > 1, 7 = {1 — a)/a, 1 < cq < 2, ztj = yj/sij, tij = {\zij — Xi\/9i)^^'^, and 
{[1], [2], [M]} a permutation of {1,2, ...,M} giving [^j < p] < ••• < ti,[M]- Then 



Pr 



/ K[k + 2]\ - \Zi,[k + l]\ ^ ^ Fgjt^l]) ^ _ ^^^/^ 



Vk,j,co = min <j n € (0, 1) : cq ( 1 - l^ — 



[2] 



< Vk. 



7.C0 



1 + 



2k 



U \7 



Proof: See Appendix\C\ 



(37) 
(38) 

□ 



Although in this paper we only use %,7,2 cq = 2 and < (cq — l)^/"^ always holds), we 

keep a more general bound which might improve the analysis of the gap estimator (or other estimators) in 
future study. Also, we should notice that as 7 — > 00 (i.e., a — > 0), 77^,7,00 0. 



10" 



10" 



i 10 

o 



10 



10 



10 



-10 



-15 



-20 



-25 



! S 5 s B _ 


s;, a = 0.03 




^^^>J^^^\^« = 0.03 




\ \ \v\ 




a =0.005 \ \ 








a = 0.005\ 



10" 



10' 
k 



10^ 




10^ 10^ 10^ 10^ 10^ 10^ 10® 



Figure 16: ■qk,'y,2 or its upper bound for a = 0.005, 0.01, 0.02, 0.03. For fc < 20 (a = 0.005), fc < 30 (a = 0.01), 
and fc < 50 (a = 0.02, 0.03), we numerically solve for 77*;, -y, 2 from ( |38] |. presented as dashed curves in the left panel. 
For larger k values, we use the upper bound in (|39] l, presented as solid curves. 

As shown in Figure [161 for small k, the constant rj^^^^co can be numerically evaluated. For larger k 
values, we resort to an upper bound which is numerically stable for any k, in the next Lemma. 
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Lemma 5 

f 2k f 2k\ \ 

Vk,^,co < minjn G (0, 1) : logco +7loglog— - 7logfc + log ( 1 + — j < 0| (39) 

Proof: Note that {u/ {2k))^^^ is close to 1 and u/ {2k) close to 0. We apply the inequalities: 1 — t < — logt, 
1 - (1 - < 1/(1 + l/(7t)), VO < t < 1, to obtain 

( (u\^l^\' ( n\7 (\, u\' 1 

CO i-Ur + 1-:T7 <1^co -log— < 



\2k} ) \ 2k} - \k ^2k) -1 + 1/(7^/2/!;) 

□ 

6.3.2 Reconstruction Error Probability 

Theorem 4 Let 7 = (1 — a) /a. Suppose the existence of > 1 and positive integer k^ satisfying 
e {aoko/{M + aoko))"' < e, i.e., aoko < M/Kl^^^, where Kl^^^ = 1/ (e/e)"/^^-") - 1. Then 

Pr {\xi^gap -Xi\>e)< Gm,ki^^^ = min | B{M, aoko/M) + J2 U + ^] %,7,2 > (40) 
where B{M, aoko/M) = Pr {Binomial{M, aoko/M) < ko) is the binomial CDF. 

Proof: The idea is that we can start counting the gaps from ko-th gap, because we only need to ensure that 
the estimated Xi is within a small neighborhood of the true Xi, not necessarily from the first gap. 

We choose q = aoko/M. If Fa {t^ko+i]) < Q' then 1/ (l + l/ti,[fco+i]) < q, or equivalently, tij^o+i] < 
q/ (1 - q). Thus, \xi^gap - Xi\ < BitJ^y^^^^^ < 9 {q/ (1 - q))'^ < e when F^ (tjjfco+i]) < Q ^"'^f 
minfco<fc<A.f-2 (|-2j,[A;+2] ~ -2i.[fc+i] I) > \^i,[2] ~ Thus, the result follows from Lemma^and the fact 

that Pr {Fa (ii.[fco+i]) > 9) < Pr {Binomial {M, q) < ko). □ 



Since ko in Theorem |4] only takes finite values, we can basically numerically evaluate GM,K*g to 
obtain the upper bound for Pr {\xi^gap — Xi| > e). It turns out that, once a and e are fixed, G is only a 
function of K* = K*g ^ and the ratio -j^^. Also, note that K* « iT/e". 

Figure [T7] plots the upper bound Gm,k* ^ for a = 0.005, 0.01, 0.03, and 0.05, in teims of K* and 

For example, when using M = hK* and a = 0.005 / 0.01 / 0.03 / 0.05, the error probabilities are 
0.042 / 0.046 / 0.084 / 0.163. In other word, in order for the error probability to be < 0.05, it suffices to use 
M = hK* if a = 0.005 or 0.01. When a = 0.03 or 0.05, we will have to use respectively M = 7K* and 
M = lOK* measurements in order to achieve error probability < 0.05. 

This way, the required sample size can be at least numerically computed from Theorem H] Of course, 
we should keep in mind that the values are merely the (possibly conservative) upper bounds. 

6.3.3 Connection to the "Idealized" Algorithm 

The "idealized" algorithm analyzed in Sec. |5] assumes a ^ and that, as long as there are two or more 
observations within (xj — e, Xj + e), there will be an algorithm which could perfectly recover (provided 
e is small enough). The gap estimator is a surrogate for implementing the "idealized" algorithm. 
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Figure 17: Numerical values of the upper bound of the error probability Pr {\xi^gap — Xi \ > e) < Gm.k* ^ as 
computed in (l40l i. The labels on the curves are the values of M /K* g ^. For example, when using M = 5/v * and a = 
0.005 / 0.01 / 0.03 / 0.05, the error probabilities are 0.042 / 0.046 / 0.084 / 0. 163. In other word, in order for the error 
probability to be < 0.05, it suffices to use M = 5A'* if a = 0.005 or 0.01. However, if a = 0.03 or 0.05, we will 
have to use respectively M = 7K* and M ~ lOK* measurements in order to achieve error probability < 0.05. 



In the "idealized" algorithm, the probability that x, can not be recovered is 



P^deal = (1 " l/K)'' + (1 - M/K 

which is basically the limit of Pr {\xi^gap — Xi\ > e) in TheoremlH as a — )• 0. Recall % 



M~l 



7iC0 



if a ^ 0. 



We can see from (1401 ) that Gm k 



Pr {Binomial{M, 1/K) < 2), which is exactly pideai- 



6.3.4 Practicality of the Gap Estimator 

The gap estimator is practical in that the enor probability bound (l40l ) holds for any a and e. This property 
allows us to use a finite a (e.g., 0.03) and very small e. Note that K* is basically iT/e". Even if we have 
to choose e = 10^^°, i.e., (10~10)0 03 = 0.5, we can still recover Xi by using twice as many examples 
compared to using a — )■ 0. 

The analysis of the "idealized" algorithm reveals that M = 5K to 7K might be sufficient for achieving 
perfect recovery. In our simulation study in Sec. |4] we find M = Mo/3 is good enough for our practical 
algorithm for a range of (M, K) values. Perhaps not surprisingly, one can verify that the values of Mq/3 
roughly fall in the 5K ~ 7K range for those values of (M, K). This, to an extent, implies that the perfor- 
mance of our practical procedure, i.e., Alg. [T]can be close to what the "idealized" algorithm could achieve, 
despite that the theoretical probability upper bound Gm,k* niight be too conservative when a is away from 
zero. 
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7 Measurement Noise 



It is intuitive that our method is robust against measurement noise. In this paper, we focus on exact sparse 
recovery. In the compressed sensing literature, the common model is to assume additive measurement 
noise y = xS + n, where each component Uj is the random noise, which is typically assumed to be rij ~ 
Normal (O, a'^Ny A precise analysis will involve a complicated calculation of convolution. 



7.1 Additive Noise 

To provide the intuition, we first present a set of experiments with additive noise in Figures [T8]tol2T] 

With N = 100000, K = 30, and M = Mq (i.e., C = 3), we have seen in the simulations in Sec.Hthat all 
methods perform well, in both Sign and Gaussian signals. When we add additive noises with cj = 0.1 to the 
measurements. Figure [T8]and Figure [T9]show that our proposed method still achieves perfect recovery while 
LP and OMP fail. When we add more measurement noise by using a = 0.5 in Figure |20]and Figure |2T] we 
observe that our method again achieves perfect recovery while both OMP and LP fail. 
To understand why our method is insensitive to measurement noise, we can examine 

'-^=^. + 0,^ + ^ (41) 

Sij Dl Ol 

Without measurement noise, our algorithm utilizes observations with <S'2/5'i to recover Xj, i.e., either 
Si is absolutely very large, or 5i is large only relative to 5*2. Because Si is extremely heavy-tailed, when 
S2/S1 « 0, it is most likely |5i| is extremely large in the absolute scale. When 5i is small, ^ will be large 
but likely ^ will be large as well (i.e., the observation would not be useful anyway). This intuition explains 
why our method is essentially indifferent to measurement noise. 



7.2 Multiplicative Noise 

An analysis for multiplicative noise turns out to be easy and should also provide a good insight why our 
method is not sensitive to measurement noise. For convenience, we consider the following model: 

N 

Vj = PjVj = Pj = Pj^^ij^i^ Pi > 0, i = 1,2, ...,M (42) 

i=l 

where pj's are assumed to be constants, to simplify the analysis. For example, when pj = 5 (or 1/5), this 
means the measurement yj is magnified (or shrunk) by a factor of 5. We assume that we still use the same 
minimum estimator Xi^min = Vt/su, where t = argmin^ yj/sij. 

Lemma 6 Assume ^/p]~" < 1/3, where ^p = and 9"^ = YZ^i l^^il"- Then 

M ^ 

Pr > 6,x. = 0) < n ( • (43) 



Proof: The proof is analogous to the proof of Theorem\J\ □ 

Note that p^"/(i-") ^ 1 even for large (or small) pj values. In other words, the false positive error 
probability is virtually not affected by the measurement noise 
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Figure 18: Reconstruction results from one simulation, with N = 100000, K = 30, AI = A/o (i.e., C = 1), cr = 0.1, 
and sign signals. With the proposed method, the signal is perfectly reconstructed in one iteration. In comparisons, 
both OMP and LP perform very poorly. 
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Figure 19: Reconstruction results from one simulation, using N = 100000, K = 30, M = Mq (i.e., C, — V), a — 0.1, 
and sign signals. Our method (using just one iteration) can still perfectly reconstruct the signal. 
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Figure 20: Reconstruction results from one simulation, with N = 100000, K = 30, AI = AI^ (i.e., C = 1), cr = 0.5, 
and sign signals. Using our method, the signal is perfectly reconstructed with one iteration. In comparisons, both 
OMP and LP perform poorly. 
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Figure 21: Reconstruction results from one simulation, using N = 100000, K = 30, M = Mq (i.e., C, — V), a — 0.5, 
and sign signals. Our method (using just one iteration) can perfectly reconstruct the signal. 
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8 Combining LO and L2 Projections 



There have been abundant of studies of compressed sensing using the Gaussian design matrix. It is a natural 
idea to combine these two types of projections. There are several obvious options. 

For example, suppose we can afford to use M = Mq = Klog{{N — K)/5) measurements to detect all 
the nonzero coordinates with essentially no false positives. We can then use additional K (or slightly larger 
than K) Gaussian measurements to recover the magnitudes of the (candidates of) nonzero coordinates via 
one least square. This option is simple and will require Mq + K total measurements. 

We have experimented with another idea. First, we use M = Mq/A measurements and Alg. [T]with 
gap estimators and iterations. Because the number of measurements may not be large enough, there will be 
a small number of undetermined coordinates after the procedure. We can apply additional K/2 Gaussian 
measurements and the LP decoding on the set of the undetermined coordinates. We find this approach also 
produces excellent recovery accuracy, with about Mq/A + K/2 total measurements. 



Here, we present some interesting experimental results on one more idea. That is, we use M = Mq/2 
measurements and hence there will be a significant number of false positives detected by the minimum 
estimator Xi^min- Instead of choosing a threshold e, we simply take the top-T coordinates ranked by 



Fi,mm|- We then use additional T Gaussian measurements and LP decoding. In Figure |22J we let T = 
1.5K,2K,2.5K,3K,3.5K,AK. When N = 10000 and K = 50, even using only T = 1.5K addi- 
tional Gaussian measurements produces excellent results. When N = 100000, we need more( in this case 
T = 2K) additional measurements, as one would expect. 
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Figure 22: Median reconstruction errors. We first apply M = A/o/2 (i.e., C = 2) measurements and the minimum 
estimator Xi^min to detect the nonzero coordinates and then select the top-T coordinates ranked by |a;i,mi„|, where 
T = 1.5K, 2K, 2.5K, 3K, 3.5K, AK, without using the threshold e. We use additional T Gaussian measurements and 
apply the LP decoding on these top-T coordinates. 
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9 Future Work 



We anticipate that this paper is a start of a new line of research. For example, we expect that the following 
research projects (among many others) will be interesting and useful. 

1 . Very sparse LO projections. Instead of using a dense design matrix S with entries sampled from 
i.i.d S{a, 1), we can use an extremely sparse matrix by making (e.g.,) 99.9% or more entries be zero. 
Furthermore, we can sample nonzero entries from a symmetric a-pai^eto distribution, i.e., a random 
variable Z with Pr (jZj > t) = ^. These efforts will significantly speed up the processing and sim- 
plify the hardware design. This is inspired by the work on very sparse stable random projections |[T6l . 

2. Correlated projections. It might be possible to use multiple projections with different a values to 
further improve the performance of sparse recovery. Recall that we can generate different (and highly 
"correlated") a-stable variables with the same set of uniform u and exponential w variables as in (|2j. 
For example, there is a recent work on using correlated stable projections for entropy estimation lITOl . 

10 Conclusion 

Compressed sensing has been a highly active area of research, because numerous important applications 
can be formulated as sparse recovery problems, for example, anomaly detections. In this paper, we present 
our first study of using LO projections for exact sparse recovery. Our practical procedure, which consists 
of the minimum estimator (for detection), the gap estimator (for estimation), and the iterative process, is 
computationally very efficient. Our algorithm is able to produce accurate recovery results with smaller 
number of measurements, compared to two strong baseUnes (LP and OMP) using the traditional Gaussian (or 
Gaussian-like) design matrix. Our method utilizes the a-stable distribution with a 0. In our experiments 
with Matlab, in order for interested readers to easily reproduce our results, we take a = 0.03 and find no 
special storage structure is needed at this value of a. Our algorithms are robust against measurement noises. 
In addition, our algorithm produces stable (partial) recovery results with no catastrophic failure even when 
the number of measurements is very small (e.g., M f« K). 

We also analyze an "idealized" algorithm by assuming a — )• 0. For a signal with K = 2 nonzero 
coordinates, merely 3 measurements are sufficient for exact recovery. For general K, our analysis reveals 
that about 5K measurements are sufficient regardless of the length of the signal vector. 

Finally, we anticipate this work will lead to interesting new research problems, for example, very sparse 
LO projections, correlated projections, etc, to further improve the algorithm. 
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A Proof of Lemma [I 



To show, Vt > 0, 

Fa{t)=E 

where 



> max 



1 + Qa/t 

F,(t) = Prf|S2/Sir/(i-") 



O ^ 

W2 



1/2 l + (l/t-3)Pr(Q^ < t)/2 ) 
l + l/t' l + l/t 



<t] , 



Qa = Qa{ui,U2) 



2a{u2) 



qa{ui) 



a/{l-a) 



Proof: Firstly, since wi and W2 are independent e2;p(l) variables, we have 



1 



Note that Pr {Qa < 1) = 1/2, and 



> 



1 



l + z/t 



dPr{Qa <z)> 



1 + 

Pr {Qa < 1) 1/2 



l + l/t l + l/t 



For the other bound, we let X = logQo,, which is symmetric about 0. This way, we can write Fa{t) = 
^t+e^- ^^^^ "^hat -j^pr is convex when X > logt (and hence Jensen's inequality applies). For now, we 
assume < i < 1 (i.e., log t < 0) and obtain 



Fa{t) =E 

>E 
>E 



t 



t + e^ 
t 



E 



t 



t + e^ 
t 



t + e^ 
I{X < logt) }+E 



I{X < logt) }+E 



t + e 



X 



I{X>logt) 



t 



t + t 
=Pr{Qa < t)/2 + 



Pr(ga <t)\ + 



t + e^ 
tPr{\X\ < |logt|) 



I{\X\ < llogtl 



t + e^{x|/(|x|<|iogt|)} 
t(l -2Pr(g„ < t)) 



_ l + (l/t-3)Pr(Qa <t)/2 
l + l/t 

1 1/2 

In particular, when < t < 1/3, we have Fa{t) > j^^- Also, note that when t > 1, j^fji is sharper than 
the other bound. 

To prove, for any fixed t > 0, IhHa^Q Fa{t) = Yqj^. we just need to use dominated convergence 
theorem and the fact that Qa 1 point-wise. This completes the proof. 
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B Proof of Lemma |2] 



To show, if < a < 1/3, then 



where 



Fait) <Cat^+^ max{l,ti+"} 



vr \ a J \1 — q; 

_ 1 r (1/(2 - 2a)) r ((1 - 3a)/(2 - 2a)) 
~ ^ r((2-3a)/(2 -2a)) 



/i2 = 1/ COS (7ra/(2 — 2a)) 
Proof: We need to first find a good lower bound of Qa, where 



Qc 



qa{u2) 



y{ui) 



a/{l-a) 



I sin(au2)| 



a/(l-a) 



cos(u2 — an2) cos 



1/(1-") 



Ul 



I sin(aiii)|"/(-^ ") cos(ui — aui) cos-^/(^ n2 
We will make use of the following inequalities, when \u\ < 7r/2, 

a| sinn| < I sinan| < a|n|, cos(u — ati) > cos u, | sin(an)| < a| tan(n) 
cos(ti — an) cos(au) + tan(ti) sin(an) ^ l + a|tanu| 



(cosM)V(i-a) 



cos u 



)Q/(l-a) 



(cos «)"/(!-") 



To see | sin(an)| < a| tan(n)|, consider, Vn € [0, vr/2], (a tann — sin(aii))' = a sec^ u — a cos(au) > 0. 
We can bound Qa as follows: 



Qa > 



sm n2 



tan til 



(cosni)°/(i-") 



1 



l + a| tan till (cos n2)"/(i"°^) 



tan U2 cos ui 



tan til 



"/{1-") (cosn,)"/(i-°) 
1 + a| tan ui| 



> 





tan U2 


a/{l-a) 




tan tti 





(1 + a|tanni|)(l + tan2ui)"/(2-2a) 
1 



(l+a|Xi|)(l+X2)"/(2-2a) 1X1X2 



a/(l-a) 



Because ui and M2 are i.i.d. um/(— 7r/2, 7r/2), we know that Xi = tanui and X2 = 1/ tanM2 are 
i.i.d. standard Cauchy variables. Therefore, 



Fait) =E 



<E 



1 + Qa/t 



< E 



(l + a|Xi|)(l + X2)-/(2-2-)|XiX, 



\a/(l~a) 



(l + a|Xi|)(l + X2)"/(2-2") IX1X2I 
1 



(1 + a|Xi|)(l + X2)°/(2-2a) 



,(l + a|Xi|)(l + X2)"/{2-2a) |Xi 



+ l/(i/"2) 



(Jensen's Inequality) 
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where ^2 = E\X2\°'^^^ Note that is concave in x. Furthermore, for any 2; > 0, 

(1 +ax)(l +x2)°/(2-2«)a;"/(i-") \ 2/7r 



"^^^-io V(l + ax)(l + x2)"/{2-2a)W(l-°) + 1/(^^*2) J 1 + 
Jz i- + X Jq 
2; Jo 1 + X2 



(ix 



}} 



<^+t;,J^i + - r (i+xr/(i--)-idx 

2; [ vr L 

=t//2/xi + ^ + ^^2(1 - a) {(1 + z2)"/(i-") _ ijj 

<t^i2/Ui + -|- + t^*2(l-a)z2"/a-")\, if-^ < 1 
TT [z J 1 — a 

where m = E {Xf + xf)"/^^"^"). The next task is to find the z which minimizes this upper bound. Note 
that m^in{2/z + rz2°/(i-°) } = {^) ^ [j^) ' attained at z = (^) Thus, we obtam 

2a 

1 1-Q / 1 — a\ / 1 + q: 

<t/i2/ii + -(tAi2(l-a))i 



where, using integral formulas llT4l 3.622.1,3.624.2] (assuming a/(l — a) < 0.5) 

2 /■"/2 



/^2 



TT 



£;|X2|"/(^-") = - / tan"/(^-")'udn= l/cos(7ra/(2-2Q)) 







= ^ fx2 + x^r^^'"'"^ = - tan°/(i'°)n ^^ ^ 1 F (1/(2 - 2a)) T ((1 - 3a)/(2 - 2a)) 



TT 7o cos"/{i-")ti TT r((2-3a)/(2 -2a)) 

Therefore, we can write 

l-Q 2a 

Fa{t) <Cat^+^ max{l,ti+c-} 

where 

2a 

Ca = ^1^2 + - (M2(1 - a)) 1+ 



vr \a/ \1 — a 

Moreover, Cq — > 1 + l/vr as a — > 0, Cq < 1.5 if a < 0.05, and Cq < 2 if a < 0.16. This completes the 
proof. 
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C Proof of Lemma m 

Let A: > 1, 7 = (1 - a)/a, 1 < cq < 2, Zi,j = yj/sij, = {\zij - Xi\/9if/^, and {[1], [2], [A/]} a 
permutation of {1, 2, ...,M} giving < [2] ^ ••• ^ ^i,[M]- To show 



pJ\3it±3lzhit±}A < 1, 1^ < _ 1)1/7 ) < ( 1 + 



..,.,eo = minjn e (0, 1) : co (l - (J)''')' + (l - < l} 



Proof: Suppose Fa(ti,[i])/Fa(ti,[2]) < (co - 1)^^^- Recall that Fa{t) = with > 0. Since 

Fa{s) / Fa{t) < s/t for s < t, we know that tj > (cq — l)iq2]' ^'^'^ hence 

ki,[2] - \ = \Zi,[2] -Xi- Z,^i] +Xi\ < |Zij2] - Xi\ + \z,^i] -Xi\=9i (^t^^] + ^ ^iC0*^[2] 

Because |zijfc+2]l - < ki,[fe+2] - ^^ijfc+i] | < |zijfc+2] - a^i I + l^^ijfc+i] - it follows that 

\^i,[k+2] \ - \Zi,[k+l] \ < ki,[2] - ^ *i^[fe+2] ~ *l[k+l] - ^0il^2] 

Again, using Fa{s) / Fa{t) < s/t for s < t and cq > 0, we obtain 

F2 (ij,[fc+2]) *i![A:+2] 

Therefore, we have 

Fi,[2] — 

Consider a > 0, 6 > 0, a'^ + b'^'co = 1. We have 

Pr {FJ + coFj (t,,[2]) > (ti,[fc+2])} 

<Pr {FJ > a^Fj (t,,[fc+2]) } + Pr {cqFJ (t,,[2]) > 6^coFj (t,,[,,+2]) } 

=Pr {F, /F, (ti,[fc+2]) > a} + Pr {F, (t.^p]) /F„ (ti,[fc+2]) > &} 



Note that Fq, (i^jj]), J = 1, 2, M ai^e order statistics of M uniform variables in unif{0, 1). This means 
Fa {ti,[j]) /Fa {ti.[k+2]) has the Beta {j, k + 2 — j) distribution. Thus 

Pr [Fa (ii,[fc+l]) /Fa (ti,[fc+2]) > «) 



Pr {Fa (t,,[2]) /i^a (ti,[fe+2]) > &) 

^ ^ x^l - x)''-^dx = ik + l){k) [ (1 - x){x)''-^dx = (1 - + kb) 
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Combining the results, we obtain 

Pr 



k 

k + 1 



We choose a = l — 6=1— {"^Y^^ , where 



Therefore, 



% = %,7,co = min £ (0, 1) : cq ^1 - (^^^ ^ ^ + (^1 



/ \Zi,[k+2] \ - \Zi,[k+l]\ ^ ^ Fa{ti^[i]) ^ _ 
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m _ Vi _ Vk fr]k\ 1/*= 
2k 2k{k + l) 2 \2k) 



k + l\ \ 2kJ 

<^^ffc + l-^U^ 
-A; + 12A;V 2k/ 2k 



III- 

--Vk + 



This completes the proof. 
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